1 Multiobjective Optimization in Bioinformatics and Computational Biology
Julia Handl, Douglas B. Kell, Joshua Knowles 

April 2007 IEEE/ACM Transactions on Computational Biology and Bioinformatics 

(TCBB), Volume 4 Issue 2 
Publisher: IEEE Computer Society Press 

Full text available: pdf(710.18 KB) Additional Information: full citation, abstract, index terms

This paper reviews the application of multiobjective optimization in the fields of
bioinformatics and computational biology. A survey of existing work, organized by 
application area, forms the main body of the review, following an introduction to the key 
concepts in multiobjective optimization. An original contribution of the review is the 
identification of five distinct "contexts," giving rise to multiple objectives: These are used 
to explain the reasons behind the use of multiobjective o ... 

Keywords: Global optimization, clustering, classification and association rules, interactive 
data exploration and discovery, experimental design, machine learning, bioinformatics 
(genome or protein) databases. 



2 Research track papers: Adaptive event detection with time-varying poisson
processes

Alexander Ihler, Jon Hutchins, Padhraic Smyth 

August 2006 Proceedings of the 12th ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '06 

Publisher: ACM Press 

Full text available: pdf(1.22 MB) Additional Information: full citation, abstract, references, index terms

Time-series of count data are generated in many different contexts, such as web access 
logging, freeway traffic monitoring, and security logs associated with buildings. Since this 
data measures the aggregated behavior of individual human beings, it typically exhibits a 
periodicity in time on a number of scales (daily, weekly, etc.) that reflects the rhythms of 
the underlying human activity and makes the data appear non-homogeneous. At the same 
time, the data is often corrupted by a number of burs ... 

Keywords: Markov modulated, event detection, poisson 



3 Analyzing Gene Expression Time-Courses 

Alexander Schliep, Ivan G. Costa, Christine Steinhoff, Alexander Schonhuth 

July 2005 IEEE/ACM Transactions on Computational Biology and Bioinformatics 
(TCBB), Volume 2 Issue 3

Publisher: IEEE Computer Society Press 

Full text available: pdf(1.33 MB) Additional Information: full citation, abstract, references, index terms

Measuring gene expression over time can provide important insights into basic cellular
processes. Identifying groups of genes with similar expression time-courses is a crucial 
first step in the analysis. As biologically relevant groups frequently overlap, due to genes 
having several distinct roles in those cellular processes, this is a difficult problem for 
classical clustering methods. We use a mixture model to circumvent this principal problem, 
with hidden Markov models (HMMs) as effective and ... 

Keywords: Index Terms- Mixture modeling, hidden Markov models, partially supervised 
learning, gene expression, time-course analysis. 



4 A survey on wavelet applications in data mining
Tao Li, Qi Li, Shenghuo Zhu, Mitsunori Ogihara 

December 2002 ACM SIGKDD Explorations Newsletter Volume 4 issue 2 
Publisher: ACM Press 

Full text available: pdf(330.06 KB) Additional Information: full citation, abstract, references, citings

Recently there has been significant development in the use of wavelet methods in various 
data mining processes. However, no comprehensive survey is available on the topic.
The goal of this paper is to fill the void. First, the paper presents a
high-level data-mining framework that reduces the overall process into smaller
components. Then applications of wavelets for each component are reviewed. The paper
concludes by discussing the impact of wavelets on data mining research an ... 

5 Combining Sequence and Time Series Expression Data to Learn Transcriptional
Modules 

Anshul Kundaje, Manuel Middendorf, Feng Gao, Chris Wiggins, Christina Leslie 

July 2005 IEEE/ACM Transactions on Computational Biology and Bioinformatics 

(TCBB), Volume 2 Issue 3 
Publisher: IEEE Computer Society Press 

Full text available: pdf(581.69 KB) Additional Information: full citation, abstract, references, index terms

Our goal is to cluster genes into transcriptional modules: sets of genes where similarity in
expression is explained by common regulatory mechanisms at the transcriptional level. We 
want to learn modules from both time series gene expression data and genome-wide motif 
data that are now readily available for organisms such as S. cerevisiae as a result of prior
computational studies or experimental results. We present a generative probabilistic model 
for combining regulatory sequence and time serie ... 

Keywords: Index Terms- Gene regulation, clustering, heterogeneous data. 



6 Research track papers: Clustering time series from ARMA models with clipped data
A. J. Bagnall, G. J. Janacek 

August 2004 Proceedings of the tenth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '04 

Publisher: ACM Press 

Full text available: pdf(305.69 KB) Additional Information: full citation, abstract, references, index terms

Clustering time series is a problem that has applications in a wide variety of fields, and has 
recently attracted a large amount of research. In this paper we focus on clustering data 
derived from Autoregressive Moving Average (ARMA) models using k-means and k- 
medoids algorithms with the Euclidean distance between estimated model parameters. We 
justify our choice of clustering technique and distance metric by reproducing results 
obtained in related research. Our research aim is to assess the aff ... 

Keywords: ARMA, clustering, time series 
7 Predictive call admission control for all-IP wireless and mobile networks
Kelvin L. Dias, Stenio F. L. Fernandes, Djamel F. H. Sadok 

October 2003 Proceedings of the 2003 IFIP/ACM Latin America conference on 
Towards a Latin American agenda for network research LANC '03 

Publisher: ACM Press 

Full text available: pdf(264.80 KB) Additional Information: full citation, abstract, references, index terms

This paper proposes a novel call admission control (CAC) scheme for wireless and mobile 
networks. Our proposal avoids per-user reservation signaling overhead and takes into 
account the expected bandwidth to be used by calls handed off from neighboring cells 
based only on local information stored in the current cell where the user is seeking
admission. To this end, we propose the use of two time series-based models for predicting 
handoff load: the Trigg and Leach (TL), which is an adaptive expon ... 

Keywords: all-IP wireless and mobile networks, call admission control, quality of service, 
scalability, time series analysis 



8 Research track papers: Detecting anomalous records in categorical datasets

Kaustav Das, Jeff Schneider

August 2007 Proceedings of the 13th ACM SIGKDD international conference on
Knowledge discovery and data mining KDD '07

Publisher: ACM Press 

Full text available: pdf(834.19 KB) Additional Information: full citation, abstract, references, index terms

We consider the problem of detecting anomalies in high-arity categorical datasets. In most
applications, anomalies are defined as datapoints that are "abnormal". Quite often we
have access to data which consists mostly of normal records, along with a small
percentage of unlabelled anomalous records. We are interested in the problem of 
unsupervised anomaly detection, where we use the unlabelled data for training, and detect 
records that do not follow the definition of normality. 



Keywords: anomaly detection, machine learning 

9 What's Strange About Recent Events (WSARE): An Algorithm for the Early Detection
of Disease Outbreaks 

Weng-Keen Wong, Andrew Moore, Gregory Cooper, Michael Wagner 
December 2005 The Journal of Machine Learning Research, Volume 6 
Publisher: MIT Press 

Full text available: pdf(341.72 KB) Additional Information: full citation, abstract

Traditional biosurveillance algorithms detect disease outbreaks by looking for peaks in a 
univariate time series of health-care data. Current health-care surveillance data, however, 
are no longer simply univariate data streams. Instead, a wealth of spatial, temporal, 
demographic and symptomatic information is available. We present an early disease 
outbreak detection algorithm called What's Strange About Recent Events (WSARE), which 
uses a multivariate approach to improve its timeliness of detect ... 

10 A unified framework for model-based clustering
Shi Zhong, Joydeep Ghosh 

December 2003 The Journal of Machine Learning Research, volume 4 
Publisher: MIT Press 

Full text available: pdf(851.48 KB) Additional Information: full citation, abstract, citings, index terms
Model-based clustering techniques have been widely used and have shown promising 
results in many applications involving complex data. This paper presents a unified
framework for probabilistic model-based clustering based on a bipartite graph view of data 
and models that highlights the commonalities and differences among existing model-based 
clustering algorithms. In this view, clusters are represented as probabilistic models in a 
model space that is conceptually separate from the data space. For ... 

11 A data-driven approach to quantifying natural human motion

Liu Ren, Alton Patrick, Alexei A. Efros, Jessica K. Hodgins, James M. Rehg

July 2005 ACM Transactions on Graphics (TOG), ACM SIGGRAPH 2005 Papers
SIGGRAPH '05, Volume 24 Issue 3
Publisher: ACM Press 

Full text available: pdf(409.67 KB), mov(28:43 MIN) Additional Information: full citation, abstract, references, citings, index terms

In this paper, we investigate whether it is possible to develop a measure that quantifies
the naturalness of human motion (as defined by a large database). Such a measure might 
prove useful in verifying that a motion editing operation had not destroyed the naturalness 
of a motion capture clip or that a synthetic motion transition was within the space of those 
seen in natural human motion. We explore the performance of mixture of Gaussians 
(MoG), hidden Markov models (HMM), and switching linear d ... 

Keywords: human animation, machine learning, motion evaluation, natural motion. 



12 Industry/government track paper: An approach to spacecraft anomaly detection
problem using kernel feature space

Ryohei Fujimaki, Takehisa Yairi, Kazuo Machida

August 2005 Proceeding of the eleventh ACM SIGKDD international conference on 
Knowledge discovery in data mining KDD '05 

Publisher: ACM Press 

Full text available: pdf(664.07 KB) Additional Information: full citation, abstract, references, index terms

Development of advanced anomaly detection and failure diagnosis technologies for 
spacecraft is a quite significant issue in the space industry, because the space 
environment is harsh, distant and uncertain. While several modern approaches based on 
qualitative reasoning, expert systems, and probabilistic reasoning have been developed 
recently for this purpose, all of them share a common difficulty in obtaining accurate and
complete a priori knowledge on the space systems from human experts. ... 

Keywords: anomaly detection, kernel feature space, principal component analysis, 
spacecraft, time series data, von Mises Fisher distribution 



13 Learning methods to combine linguistic indicators: improving aspectual classification
and revealing linguistic insights

Eric V. Siegel, Kathleen R. McKeown 

December 2000 Computational Linguistics, Volume 26 issue 4 
Publisher: MIT Press 

Full text available: pdf, Publisher Site Additional Information: full citation, abstract, references, citings

Aspectual classification maps verbs to a small set of primitive categories in order to reason 
about time. This classification is necessary for interpreting temporal modifiers and 
assessing temporal relationships, and is therefore a required component for many natural 
language applications. A verb's aspectual category can be predicted by co-occurrence 
frequencies between the verb and certain linguistic modifiers. These frequency measures, 
called linguistic indicators, are chosen by linguistic insi ... 

14 Building Blocks for Variational Bayesian Learning of Latent Variable Models
Tapani Raiko, Harri Valpola, Markus Harva, Juha Karhunen

http://portal.acm.org/results.cfm?coll=portal&dl=ACM& 9/4/2007 



Results (page 1): "time series" AND likelihood AND diagnostic AND "unsupervised lear... Page 5 of 6



May 2007 The Journal of Machine Learning Research, volume 8 
Publisher: MIT Press 

Full text available: pdf(487.10 KB) Additional Information: full citation, abstract

We introduce standardised building blocks designed to be used with variational Bayesian 
learning. The blocks include Gaussian variables, summation, multiplication, nonlinearity, 
and delay. A large variety of latent variable models can be constructed from these blocks, 
including nonlinear and variance models, which are lacking from most existing variational 
systems. The introduced blocks are designed to fit together and to yield efficient update 
rules. Practical implementation of various model ... 

15 A multimodal learning interface for grounding spoken language in sensory
perceptions

Chen Yu, Dana H. Ballard

July 2004 ACM Transactions on Applied Perception (TAP), Volume 1 issue 1

Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms

We present a multimodal interface that learns words from natural interactions with users. 
In light of studies of human language development, the learning system is trained in an 
unsupervised mode in which users perform everyday tasks while providing natural 
language descriptions of their behaviors. The system collects acoustic signals in concert 
with user-centric multisensory information from nonspeech modalities, such as user's 
perspective video, gaze positions, head directions, and hand moveme ... 

Keywords: Multimodal learning, cognitive modeling, multimodal interaction 

16 One-Class Novelty Detection for Seizure Analysis from Intracranial EEG
Andrew B. Gardner, Abba M. Krieger, George Vachtsevanos, Brian Litt 

December 2006 The Journal of Machine Learning Research, Volume 7 

Publisher: MIT Press 

Full text available: pdf(264.01 KB) Additional Information: full citation, abstract

This paper describes an application of one-class support vector machine (SVM) novelty 
detection for detecting seizures in humans. Our technique maps intracranial 
electroencephalogram (EEG) time series into corresponding novelty sequences by 
classifying short-time, energy-based statistics computed from one-second windows of 
data. We train a classifier on epochs of interictal (normal) EEG. During ictal (seizure) 
epochs of EEG, seizure activity induces distributional changes in feature space tha ... 

17 DMSEC session: MORPHEUS: motif oriented representations to purge hostile events
from unlabeled sequences

Gaurav Tandon, Philip Chan, Debasis Mitra

October 2004 Proceedings of the 2004 ACM workshop on Visualization and data
mining for computer security VizSEC/DMSEC '04
Publisher: ACM Press 

Full text available: pdf(272.36 KB) Additional Information: full citation, abstract, references, index terms

Most of the prevalent anomaly detection systems use some training data to build models. 
These models are then utilized to capture any deviations resulting from possible intrusions. 
The efficacy of such systems is highly dependent upon a training data set free of attacks. 
"Clean" or labeled training data is hard to obtain. This paper addresses the very practical 
issue of refinement of unlabeled data to obtain a clean data set which can then train an 
online anomaly detection system. 

Our... 

Keywords: anomaly detection, data cleaning, motifs 
18 Information retrieval and extraction 2: Efficient topic-based unsupervised name
disambiguation

Yang Song, Jian Huang, Isaac G. Councill, Jia Li, C. Lee Giles

June 2007 Proceedings of the 2007 conference on Digital libraries JCDL '07 

Publisher: ACM Press 

Full text available: pdf(734.48 KB) Additional Information: full citation, abstract, references, index terms

Name ambiguity is a special case of identity uncertainty where one person can be 
referenced by multiple name variations in different situations or even share the same
name with other people. In this paper, we focus on the problem of disambiguating person 
names within web pages and scientific documents. We present an efficient and effective 
two-stage approach to disambiguate names. In the first stage, two novel topic-based 
models are proposed by extending two hierarchical Bayesian text models, ... 

Keywords: bayesian models, hierarchical clustering methods, name disambiguation, 
probability analysis, unsupervised machine learning 



19 Object tracking: A survey 
Alper Yilmaz, Omar Javed, Mubarak Shah

December 2006 ACM Computing Surveys (CSUR), Volume 38 issue 4
Publisher: ACM Press 

Full text available: pdf(2.60 MB) Additional Information: full citation, abstract, references, index terms

The goal of this article is to review the state-of-the-art tracking methods, classify them 
into different categories, and identify new trends. Object tracking, in general, is a 
challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, 
changing appearance patterns of both the object and the scene, nonrigid object structures, 
object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually 
performed in the context of higher-level applicatio ... 

Keywords: Appearance models, contour evolution, feature selection, object detection, 
object representation, point tracking, shape tracking 
20 Trajectory clustering with mixtures of regression models
Scott Gaffney, Padhraic Smyth 

August 1999 Proceedings of the fifth ACM SIGKDD international conference on 

Knowledge discovery and data mining KDD '99 
Publisher: ACM Press 

Full text available: pdf(1.31 MB) Additional Information: full citation, references, citings, index terms
21 Mining anomalies using traffic feature distributions
Anukool Lakhina, Mark Crovella, Christophe Diot 

August 2005 ACM SIGCOMM Computer Communication Review , Proceedings of the 
2005 conference on Applications, technologies, architectures, and 
protocols for computer communications SIGCOMM '05, volume 35 issue 4 
Publisher: ACM Press 

Full text available: pdf(323.63 KB) Additional Information: full citation, abstract, references, citings, index terms



The increasing practicality of large-scale flow capture makes it possible to conceive of 
traffic analysis methods that detect and identify a large and diverse set of anomalies. 
However, the challenge of effectively analyzing this massive data source for anomaly
diagnosis is as yet unmet. We argue that the distributions of packet features (IP addresses
and ports) observed in flow traces reveal both the presence and the structure of a wide
range of anomalies. Using entropy as a summarization tool, ... 

Keywords: anomaly classification, anomaly detection, network-wide traffic analysis 



22 The effects of lexical specialization on the growth curve of the vocabulary 
R. Harald Baayen 

December 1996 Computational Linguistics, Volume 22 issue 4 
Publisher: MIT Press 

Full text available: pdf, Publisher Site Additional Information: full citation, abstract, references, citings

The number of different words expected on the basis of the urn model to appear in, for 
example, the first half of a text, is known to overestimate the observed number of 
different words. This paper examines the source of this overestimation bias. It is shown 
that this bias does not arise due to sentence-bound syntactic constraints, but that it is a 
direct consequence of topic cohesion in discourse. The nonrandom, clustered appearance 
of lexically specialized words, often the key words of the tex ... 



23 Fast detection of communication patterns in distributed executions
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced 
Studies on Collaborative research CASCON '97 

Publisher: IBM Press 

Full text available: pdf(4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based
on process-time diagrams are often used to obtain a better understanding of the execution
of the application. The visualization tool we use is Poet, an event tracer developed at the
University of Waterloo. However, these diagrams are often very complex and do not 
provide the user with the desired overview of the application. In our experience, such tools 
display repeated occurrences of non-trivial commun ... 

24 Multiple Peak Alignment in Sequential Data Analysis: A Scale-Space-Based
Approach

Weichuan Yu, Xiaoye Li, Junfeng Liu, Baolin Wu, Kenneth R. Williams, Hongyu Zhao 
July 2006 IEEE/ACM Transactions on Computational Biology and Bioinformatics 

(TCBB), Volume 3 Issue 3 
Publisher: IEEE Computer Society Press 

Full text available: pdf(1.52 MB) Additional Information: full citation, abstract, references, index terms

In this paper, we address the multiple peak alignment problem in sequential data analysis 
with an approach based on the Gaussian scale-space theory. We assume that multiple sets 
of detected peaks are the observed samples of a set of common peaks. We also assume 
that the locations of the observed peaks follow unimodal distributions (e.g., normal 
distribution) with their means equal to the corresponding locations of the common peaks 
and variances reflecting the extension of their variations. Under ... 

Keywords: Biomarker discovery, peak identification, multiple peak alignment, scale- 
space, prior information, energy minimization, parameter optimization. 
25 Subgroup Discovery with CN2-SD
Nada Lavrac, Branko Kavsek, Peter Flach, Ljupco Todorovski 

December 2004 The Journal of Machine Learning Research, volume 5 
Publisher: MIT Press 

Full text available: pdf(435.37 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper investigates how to adapt standard classification rule learning approaches to 
subgroup discovery. The goal of subgroup discovery is to find rules describing subsets of 
the population that are sufficiently large and statistically unusual. The paper presents a 
subgroup discovery algorithm, CN2-SD, developed by modifying parts of the CN2 
classification rule learner: its covering algorithm, search heuristic, probabilistic 
classification of instances, and evaluation measures. Experi .... 

26 Face recognition: A literature survey

W. Zhao, R. Chellappa, P. J. Phillips, A. Rosenfeld 
December 2003 ACM Computing Surveys (CSUR), Volume 35 issue 4 

Publisher: ACM Press 

Full text available: pdf(4.28 MB) Additional Information: full citation, abstract, references, citings, index terms

As one of the most successful applications of image analysis and understanding, face 
recognition has recently received significant attention, especially during the past several 
years. At least two reasons account for this trend: the first is the wide range of 
commercial and law enforcement applications, and the second is the availability of feasible 
technologies after 30 years of research. Even though current machine recognition systems 
have reached a certain level of maturity, their success is ... 

Keywords: Face recognition, person identification 

27 AI and computational logic and image analysis (AI): Facial emotion recognition by
adaptive processing of tree structures

Jia-Jun Wong, Siu-Yeung Cho

April 2006 Proceedings of the 2006 ACM symposium on Applied computing SAC '06

Publisher: ACM Press 

Full text available: pdf(468.59 KB) Additional Information: full citation, abstract, references, index terms

We present an emotion recognition system based on a probabilistic approach to adaptive 
processing of Facial Emotion Tree Structures (FETS). FETS are made up of localized Gabor 
features related to the facial components according to the Facial Action Coding System. 
The proposed model is an extension of the probabilistic based recursive neural network
model applied to face recognition by Cho and Wong [1]. The robustness of the model in
an emotion recognition system is evaluated by testing with kno ... 

Keywords: facial emotion tree structures, neural networks, probabilistic based neural 
networks, tree structures 



28 DBMiner: a system for data mining in relational databases and data warehouses 
Jiawei Han, Jenny Y. Chiang, Sonny Chee, Jianping Chen, Qing Chen, Shan Cheng, Wan 
Gong, Micheline Kamber, Krzysztof Koperski, Gang Liu, Yijun Lu, Nebojsa Stefanovic, Lara 
Winstone, Betty B. Xia, Osmar R. Zaiane, Shuhua Zhang, Hua Zhu 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced 

Studies on Collaborative research CASCON '97 
Publisher: IBM Press 

Full text available: pdf(280.67 KB) Additional Information: full citation, abstract, references, citings, index terms

A data mining system, DBMiner, has been developed for interactive mining of multiple- 
level knowledge in large relational databases and data warehouses. The system 
implements a wide spectrum of data mining functions, including characterization, 
comparison, association, classification, prediction, and clustering. By incorporating several 
interesting data mining techniques, including OLAP and attribute-oriented induction, 
statistical analysis, progressive deepening for mining multiple-level knowled ... 

29 An integrated model of drilling vessel operations 
Susan E. Hoffman, Melba M. Crawford, James R. Wilson 

December 1983 Proceedings of the 15th conference on Winter simulation - Volume 1 

WSC '83 
Publisher: IEEE Press 

Full text available: pdf(802.06 KB) Additional Information: full citation, abstract, references, citings, index terms

A combined discrete-event/continuous/process-interaction simulation model has been 
developed to evaluate the effects of weather and supply-ship availability on off-shore 
drilling operations at a specified location and time of year with a specified drilling vessel. 
The continuous submodel includes: (a) autoregressive-moving average and transfer- 
function models to represent weather conditions; (b) a difference equation to monitor 
effective work time for the current operation on the drilling v ... 

30 Mining web logs to debug distant connectivity problems 
Emre Kiciman, David A. Maltz, Moises Goldszmidt, John C. Platt

September 2006 Proceedings of the 2006 SIGCOMM workshop on Mining network data
MineNet '06

Publisher: ACM Press 

Full text available: pdf(194.40 KB) Additional Information: full citation, abstract, references, index terms

Content providers base their business on their ability to receive and answer requests from 
clients distributed across the Internet. Since disruptions in the flow of these requests 
directly translate into lost revenue, there is tremendous incentive to diagnose why some 
requests fail and prod the responsible parties into corrective action. However, a content 
provider has only limited visibility into the state of the Internet outside its domain. 
Instead, it must mine failure diagnoses from availabl ... 
31 Industrial/government track: Clinical and financial outcomes analysis with existing
hospital patient records

R. Bharat Rao, Sathyakama Sandilya, Radu Stefan Niculescu, Colin Germond, Harsha Rao
August 2003 Proceedings of the ninth ACM SIGKDD international conference on 

Knowledge discovery and data mining KDD '03 
Publisher: ACM Press 

Full text available: pdf(188.40 KB) Additional Information: full citation, abstract, references, citings, index terms

Existing patient records are a valuable resource for automated outcomes analysis and 
knowledge discovery. However, key clinical data in these records is typically recorded in 
unstructured form as free text and images, and most structured clinical information is 
poorly organized. Time-consuming interpretation and analysis is required to convert these 
records into structured clinical data. Thus, only a tiny fraction of this resource is utilized. 
We present REMIND, a Bayesian Framework for Reliable ... . 

Keywords: Bayes Nets, HMMs, data mining, temporal reasoning 

32 Weakly supervised named entity transliteration and discovery from multilingual
comparable corpora
Alexandre Klementiev, Dan Roth 

July 2006 Proceedings of the 21st International Conference on Computational 
Linguistics and the 44th annual meeting of the ACL ACL '06 

Publisher: Association for Computational Linguistics 

Full text available: pdf(188.24 KB) Additional Information: full citation, abstract, references

Named Entity recognition (NER) is an important part of many natural language processing 
tasks. Current approaches often employ machine learning techniques and require 
supervised data. However, many languages lack such resources. This paper presents an 
(almost) unsupervised learning algorithm for automatic discovery of Named Entities (NEs) 
in a resource free language, given a bilingual corpora in which it is weakly temporally 
aligned with a resource rich language. NEs have similar time distributi ... 

33 Information processing in the context of medical care
Valerie Florance, Gary Marchionini 

July 1995 Proceedings of the 18th annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '95 
Publisher: ACM Press 

Full text available: pdf(608.92 KB) Additional Information: full citation, references, citings, index terms



34 Associative Clustering for Exploring Dependencies between Functional Genomics
Data Sets

Samuel Kaski, Janne Nikkila, Janne Sinkkonen, Leo Lahti, Juha E. A. Knuuttila, Christophe 
Roos 

July 2005 IEEE/ACM Transactions on Computational Biology and Bioinformatics 

(TCBB), Volume 2 Issue 3 
Publisher: IEEE Computer Society Press 

Full text available: pdf(896.56 KB) Additional Information: full citation, abstract, references, index terms

High-throughput genomic measurements, interpreted as cooccurring data samples from 
multiple sources, open up a fresh problem for machine learning: What is in common in the 
different data sets, that is, what kind of statistical dependencies are there between the 
paired samples from the different sets? We introduce a clustering algorithm for exploring 
the dependencies. Samples within each data set are grouped such that the dependencies 
between groups of different sets capture as much of pairwise d ... 

Keywords: Biology and genetics, clustering, contingency table analysis,
machine learning, multivariate statistics. 
35 Summarizing scientific articles: experiments with relevance and rhetorical status
Simone Teufel, Marc Moens 

December 2002 Computational Linguistics, volume 28 issue 4 
Publisher: MIT Press 

Full text available: pdf(424.69 KB) Additional Information: full citation, abstract, references, citings, index terms

In this article we propose a strategy for the summarization of scientific articles that 
concentrates on the rhetorical status of statements in an article: Material for summaries is 
selected in such a way that summaries can highlight the new contribution of the source 
article and situate it with respect to earlier work. We provide a gold standard for
summaries of this kind consisting of a substantial corpus of conference articles in 
computational linguistics annotated with human judgments of the r ... 

36 Simulation metamodels 
Russell R. Barton 

December 1998 Proceedings of the 30th conference on Winter simulation WSC '98 
Publisher: IEEE Computer Society Press 

Full text available: pdf(79.38 KB) Additional Information: full citation, references, citings, index terms



37 Semisupervised Learning for Molecular Profiling 

Cesare Furlanello, Maria Serafini, Stefano Merler, Giuseppe Jurman 
April 2005 IEEE/ACM Transactions on Computational Biology and Bioinformatics 

(TCBB), Volume 2 Issue 2 
Publisher: IEEE Computer Society Press 

Full text available: pdf(1.09 MB) Additional Information: full citation, abstract, references, index terms

Class prediction and feature selection are two learning tasks that are strictly paired in the 
search of molecular profiles from microarray data. Researchers have become aware how 
easy it is to incur a selection bias effect, and complex validation setups are required to 
avoid overly optimistic estimates of the predictive accuracy of the models and incorrect 
gene selections. This paper describes a semisupervised pattern discovery approach that 
uses the by-products of complete validation studies on ... 

Keywords: Machine learning, data mining, classifier design and evaluation, feature 
evaluation and selection, pattern analysis, clustering, similarity measures, biology and 
genetics, bioinformatics databases. 



38 Network measurement: Diagnosing network disruptions with network-wide analysis
Yiyi Huang, Nick Feamster, Anukool Lakhina, Jim (Jun) Xu

June 2007 Proceedings of the 2007 ACM SIGMETRICS international conference on
Measurement and modeling of computer systems SIGMETRICS '07 

Publisher: ACM Press 

Full text available: pdf(374.88 KB) Additional Information: full citation, abstract, references, index terms

To maintain high availability in the face of changing network conditions, network operators 
must quickly detect, identify, and react to events that cause network disruptions. One way 
to accomplish this goal is to monitor routing dynamics, by analyzing routing update 
streams collected from routers. Existing monitoring approaches typically treat streams of 
routing updates from different routers as independent signals, and report only the "loud" 
events (i.e., events that involve large volume of ... 

Keywords: anomaly detection, network management, statistical inference 
39 TextTiling: segmenting text into multi-paragraph subtopic passages
Marti A. Hearst 

March 1997 Computational Linguistics, volume 23 issue 1
Publisher: MIT Press 
Full text available: pdf, Publisher Site Additional Information: full citation, abstract, references, citings

TextTiling is a technique for subdividing texts into multi-paragraph units that represent 
passages, or subtopics. The discourse cues for identifying major subtopic shifts are 
patterns of lexical co-occurrence and distribution. The algorithm is fully implemented and 
is shown to produce segmentation that corresponds well to human judgments of the 
subtopic boundaries of 12 texts. Multi-paragraph subtopic segmentation should be useful 
for many text analysis tasks, including information retrieval and ... 

40 Logic design 

February 1973 Proceedings of the 1st annual computer science conference on 
Program information abstracts CWC '73 

Publisher: ACM Press 

Full text available: pdf(257.23 KB) Additional Information: full citation
41 Event detection from time series data
Valery Guralnik, Jaideep Srivastava 

August 1999 Proceedings of the fifth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '99 

Publisher: ACM Press 

Full text available: pdf(1.01 MB) Additional Information: full citation, references, citings, index terms



42 The KDD process for extracting useful knowledge from volumes of data 
Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth 
November 1996 Communications of the ACM, volume 39 issue 11

Publisher: ACM Press 

Full text available: pdf(523.49 KB) Additional Information: full citation, references, citings, index terms




43 Data mining (DM): A model for mining outliers from complex data sets

Hongwei Qi, Jue Wang
March 2004 Proceedings of the 2004 ACM symposium on Applied computing SAC '04 
Publisher: ACM Press 

Full text available: pdf(321.83 KB) Additional Information: full citation, abstract, references

To solve the outlier mining problems where outliers are highly intermixed with normal 
data, a general Variance-based Outlier Mining Model (VOMM) is presented, in which the 
information of data is decomposed into normal and abnormal components according to 
their variances. With minimal loss of normal information in the VOMM, outliers are viewed 
as the top k samples holding maximal abnormal information in a dataset. And then, the 
principal curve that is a smooth nonparametric curve passing through ... 

Keywords: Outlier Mining, principal curve, stock market 



44 Spatio-temporal data management: Trajectory clustering: a partition-and-group 
framework 

Jae-Gil Lee, Jiawei Han, Kyu-Young Whang

June 2007 Proceedings of the 2007 ACM SIGMOD international conference on 
Management of data SIGMOD '07 

Publisher: ACM Press 
Existing trajectory clustering algorithms group similar trajectories as a whole, thus
discovering common trajectories. Our key observation is that clustering trajectories as a 
whole could miss common sub-trajectories. Discovering common sub-trajectories is very 
useful in many applications, especially if we have regions of special interest for analysis. In 
this paper, we propose a new partition-and-group framework for clustering trajectories, 
which partitions a trajectory into a ... 

Keywords: MDL principle, density-based clustering, partition-and-group framework, 
trajectory clustering 



45 Bioinformatics — an introduction for computer scientists
Jacques Cohen 

June 2004 ACM Computing Surveys (CSUR), Volume 36 issue 2 
Publisher: ACM Press 

Full text available: pdf(261.56 KB) Additional Information: full citation, abstract, references, citings, index terms

The article aims to introduce computer scientists to the new field of bioinformatics. This 
area has arisen from the needs of biologists to utilize and help interpret the vast amounts 
of data that are constantly being gathered in genomic research---and its more recent 
counterparts, proteomics and functional genomics. The ultimate goal of bioinformatics is to 
develop in silico models that will complement in vitro and in vivo biological experiments.
The article provides a bird's eye view of the ... 

Keywords: DNA, Molecular cell biology, RNA and protein structure, alignments, cell 
simulation and modeling, computer, dynamic programming, hidden-Markov-models, 
microarray, parsing biological sequences, phylogenetic trees 



46 Data mining to detect abnormal behavior in aerospace data
Jose M. Pena, Fazel Famili, Sylvain Letourneau 

August 2000 Proceedings of the sixth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '00 

Publisher: ACM Press 

Full text available: pdf(179.60 KB) Additional Information: full citation, references, index terms



Keywords: data partitioning, machine learning, trend monitoring 



47 Modeling changing dependency structure in multivariate time series

Xiang Xuan, Kevin Murphy
June 2007 Proceedings of the 24th international conference on Machine learning 

ICML '07 
Publisher: ACM Press 

Full text available: pdf(331.40 KB) Additional Information: full citation, abstract, references

We show how to apply the efficient Bayesian changepoint detection techniques of 
Fearnhead in the multivariate setting. We model the joint density of vector-valued 
observations using undirected Gaussian graphical models, whose structure we estimate. 
We show how we can exactly compute the MAP segmentation, as well as how to draw 
perfect samples from the posterior over segmentations, simultaneously accounting for 
uncertainty about the number and location of changepoints, as well as uncertainty a ... 

48 Feature Selection for Unsupervised Learning
Jennifer G. Dy, Carla E. Brodley 

December 2004 The Journal of Machine Learning Research, volume 5 
Publisher: MIT Press

Full text available: pdf(725.21 KB) Additional Information: full citation, abstract, references, citings

In this paper, we identify two issues involved in developing an automated feature subset
selection algorithm for unlabeled data: the need for finding the number of clusters in 
conjunction with feature selection, and the need for normalizing the bias of feature 
selection criteria with respect to dimension. We explore the feature selection problem and 
these issues through FSSEM (Feature Subset Selection using Expectation-Maximization 
(EM) clustering) and through two different performance criteria ... 

49 Research papers: mining biological and medical data: Subsequence matching on
structured time series data 

Huanmei Wu, Betty Salzberg, Gregory C Sharp, Steve B Jiang, Hiroki Shirato, David Kaeli 
June 2005 Proceedings of the 2005 ACM SIGMOD international conference on 

Management of data SIGMOD '05 
Publisher: ACM Press 

Full text available: pdf(930.08 KB) Additional Information: full citation, abstract, references

Subsequence matching in time series databases is a useful technique, with applications in 
pattern matching, prediction, and rule discovery. Internal structure within the time series 
data can be used to improve these tasks, and provide important insight into the problem 
domain. This paper introduces our research effort in using the internal structure of a time 
series directly in the matching process. This idea is applied to the problem domain of 
respiratory motion data in cancer radiation treatme ... 

50 Unsupervised learning of the morphology of a natural language
John Goldsmith 

June 2001 Computational Linguistics, Volume 27 issue 2 

Publisher: MIT Press 

Full text available: pdf, Publisher Site Additional Information: full citation, abstract, references, citings

This study reports the results of using minimum description length (MDL) analysis to 
model unsupervised learning of the morphological segmentation of European languages, 
using corpora ranging in size from 5,000 words to 500,000 words. We develop a set of 
heuristics that rapidly develop a probabilistic morphological grammar, and use MDL as our 
primary tool to determine whether the modifications proposed by the heuristics will be 
adopted or not. The resulting grammar matches well the analysis that 

51 Research track poster: LIPED: HMM-based life profiles for adaptive event detection
Chien Chin Chen, Meng Chang Chen, Ming-Syan Chen 

August 2005 Proceeding of the eleventh ACM SIGKDD international conference on 
Knowledge discovery in data mining KDD '05 

Publisher: ACM Press 

Full text available: pdf(878.45 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper, the proposed LIPED (Life Profile based Event Detection) employs the
concept of life profiles to predict the activeness of event for effective event detection. A 
group of events with similar activeness patterns shares a life profile, modeled by a hidden 
Markov model. Considering the burst-and-diverse property of events, LIPED identifies the 
activeness status of event. As a result, LIPED balances the clustering precision and recall 
to achieve better F1 scores than other well known a ...

Keywords: clustering, event detection, hidden markov models, life profiles 

52 Data streams (PS): Quality-driven evaluation of trigger conditions on streaming time
March 2005 Proceedings of the 2005 ACM symposium on Applied computing SAC '05
Publisher: ACM Press 

Full text available: pdf(151.13 KB) Additional Information: full citation, abstract, references

For many applications, it is important to evaluate trigger conditions on time series 
streams. In a resource constrained environment, users' needs should ultimately decide 
how the evaluation system balances the competing factors such as evaluation speed, 
result precision, and load shedding level. This paper presents a basic framework for 
evaluation algorithms that takes user-specified quality requirements into consideration. 
Three optimization algorithms, each under a different set of quality req ... 

53 Think globally, fit locally: unsupervised learning of low dimensional manifolds
Lawrence K. Saul, Sam T. Roweis 

December 2003 The Journal of Machine Learning Research, Volume 4 
Publisher: MIT Press 

Full text available: pdf(2.91 MB) Additional Information: full citation, abstract, references, citings, index terms

The problem of dimensionality reduction arises in many fields of information processing, 
including machine learning, data compression, scientific visualization, pattern recognition, 
and neural computation. Here we describe locally linear embedding (LLE), an unsupervised 
learning algorithm that computes low dimensional, neighborhood preserving embeddings 
of high dimensional data. The data, assumed to be sampled from an underlying manifold, 
are mapped into a single global coordinate system of lowe ... 

54 What types of events provide the strongest evidence that the stock market is affected
by company specific news? 

Calum Robertson, Shlomo Geva, Rodney Wolff 

November 2006 Proceedings of the fifth Australasian conference on Data mining and 
analytics - Volume 61 AusDM '06

Publisher: Australian Computer Society, Inc. 

Full text available: pdf(508.13 KB) Additional Information: full citation, abstract, references

The efficient market hypothesis states that an efficient market immediately incorporates 
all available information into the price of the traded entity. It is well established that the 
stock market is not an efficient market as it consists of numerous traders with differing 
strategies and interpretations of information. However there is substantial evidence to 
suggest that the stock market does incorporate new information into prices. Unfortunately 
little research has focussed on the high freq ... 

Keywords: market reaction, news, return, stock market, volatility 



55 Essential Latent Knowledge for Protein-Protein Interactions: Analysis by an 
Unsupervised Learning Approach 
Hiroshi Mamitsuka 

April 2005 IEEE/ACM Transactions on Computational Biology and Bioinformatics 

(TCBB), Volume 2 Issue 2 
Publisher: IEEE Computer Society Press 

Full text available: pdf(1.25 MB) Additional Information: full citation, abstract, references, index terms

Protein-protein interactions play a number of central roles in many cellular functions,
including DNA replication, transcription and translation, signal transduction, and metabolic 
pathways. A recent increase in the number of protein-protein interactions has made 
predicting unknown protein-protein interactions important for the understanding of living 
cells. However, the protein-protein interactions experimentally obtained so far are often 
incomplete and contradictory and, consequently, existing ... 

Keywords: Biology and genetics, machine learning, data mining, mining methods and 
algorithms. 
56 Learning fixed-dimension linear thresholds from fragmented data

Paul W. Goldberg
July 1999 Proceedings of the twelfth annual conference on Computational learning 
57 Currency exchange rate forecasting from news headlines
Desh Peramunetilleke, Raymond K. Wong 

January 2002 Australian Computer Science Communications, Proceedings of the 13th

Australasian database conference - Volume 5 ADC '02, volume 24 issue 2 
Publisher: Australian Computer Society, Inc., IEEE Computer Society Press 
Full text available: pdf(797.48 KB) Additional Information: full citation, abstract, references, index terms

We investigate how money market news headlines can be used to forecast intraday 
currency exchange rate movements. The innovation of the approach is that, unlike 
analysis based on quantifiable information, the forecasts are produced from text describing 
the current status of world financial markets, as well as political and general economic 
news. In contrast to numeric time series data textual data contains not only the effect 
(e.g., the dollar rises against the Deutschmark) but also the possible ... 

Keywords: data mining, foreign exchange, prediction 



58 MEDCAT: an APL program for medical diagnosis, consultation, and teaching 
W. D. Hagamen, Martin Gardy, Gregory Bell, Edwin Rekosh, Steven Zatz 
May 1985 ACM SIGAPL APL Quote Quad, Proceedings of the international

conference on APL: APL and the future APL '85, Volume 15 issue 4 
Publisher: ACM Press 

Full text available: pdf(794.80 KB) Additional Information: full citation, abstract, references, citings, index terms

This is a description of MEDCAT, a computer program which makes diagnoses, explains 
each step in its reasoning in response to questions, increases its knowledge and reasoning 
ability by conversing with expert physicians, and uses its logical and communicative skills 
to help and evaluate medical students in the proper approach to medical diagnosis. The 
mechanism for each of these features is discussed. MEDCAT is coded in APL and 
implemented on a 68000 based microcomputer. 

59 Research track paper: Detection of emerging space-time clusters 
Daniel B. Neill, Andrew W. Moore, Maheshkumar Sabhnani, Kenny Daniel 
August 2005 Proceeding of the eleventh ACM SIGKDD international conference on 

Knowledge discovery in data mining KDD '05 
Publisher: ACM Press 
We propose a new class of spatio-temporal cluster detection methods designed for the
rapid detection of emerging space-time clusters. We focus on the motivating application of 
prospective disease surveillance: detecting space-time clusters of disease cases resulting 
from an emerging disease outbreak. Automatic, real-time detection of outbreaks can 
enable rapid epidemiological response, potentially reducing rates of morbidity and 
mortality. Building on the prior work on spatial and space-time sea ... 

Keywords: biosurveillance, cluster detection, space-time scan statistics 
Vessel segmentation algorithms are the critical components of circulatory blood vessel
analysis systems. We present a survey of vessel extraction techniques and algorithms. We
put the various vessel extraction approaches and techniques in perspective by means of a 
classification of the existing research. While we have mainly targeted the extraction of 
blood vessels, neurovascular structure in particular, we have also reviewed some of the
segmentation methods for the tubular objects that show ... 

Keywords: Magnetic resonance angiography, X-ray angiography, medical imaging, 
neurovascular, vessel extraction 